AFCON 2024 MATCH ANALYSIS: A data-driven exploration of team performance and scoring trends across both halves of a game

This project takes a closer look at how teams performed in each half of a game. It analyses the scoring trends and results in both halves, as well as the full-time match outcomes.

This project provides an in-depth analysis of the Africa Cup of Nations (AFCON) football tournament, focusing on team performance, scoring patterns, and match outcomes. Using data sourced through web scraping from Sofascore, it aims to identify trends in goals scored across both halves, assess team performance in different match phases, and explore factors contributing to match outcomes.

The dataset includes the match date, the team names, the goals scored by each team in the first and second halves, the result of each half, and the full-time result. Additional data cleaning was done in Microsoft Excel to correct inaccurate scores and ensure data integrity.

The findings are presented as interactive charts built with Plotly in a Jupyter Notebook.

Data Collection

In this section, we detail the process used to collect data for the Africa Cup of Nations (AFCON) football tournament. The data was retrieved using web scraping via an API provided by Sofascore, which allows access to real-time match statistics. The process involves sending requests to a specific API endpoint, parsing the returned data, and cleaning it for further analysis.
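The request step described above can be wrapped in a small helper. This is a minimal sketch rather than the code used in this project: `fetch_json` is a hypothetical name, the 10-second timeout is an arbitrary choice, and the `get` argument is assumed to behave like `requests.get`, returning an object that offers `raise_for_status()` and `json()`.

```python
def fetch_json(url, get, timeout=10):
    """Request `url` with the supplied `get` callable and return the parsed JSON.

    `get` is expected to behave like `requests.get`.
    """
    response = get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of continuing silently
    response.raise_for_status()
    return response.json()


# Intended use in the notebook (performs a live request, so it is left commented out):
# import requests
# event = fetch_json("https://api.sofascore.com/api/v1/event/11761888", requests.get)
```

Passing the `get` callable in as an argument keeps the helper easy to try out with a stand-in object, without contacting the API.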

In [1]:
# importing the needed libraries

import requests

import json

import csv

import pandas as pd

import plotly.express as px

import matplotlib.pyplot as plt

import numpy as np

import seaborn as sns

import plotly.graph_objects as go

from plotly.subplots import make_subplots

%matplotlib inline
In [2]:
import plotly.io as pio
pio.renderers.default = 'notebook'
In [ ]:
# Establishing a connection with the website

response = requests.get("https://api.sofascore.com/api/v1/event/11761888")

if response.status_code == 200:

    print(response.json())  # This will print the JSON response
    
else:

    print("Failed to retrieve data")
In [ ]:
afcon_url = "https://www.sofascore.com/api/v1/unique-tournament/270/season/56021/team-events/total"

response = requests.get(afcon_url)

if response.status_code == 200:
    # Parse JSON data

    afcon_data = response.json()

    print(afcon_data)  # Display the raw JSON data for inspection

else:
    
    print(f"Failed to retrieve data: {response.status_code}")

The data needed for the project was extracted from the JSON response.

In [ ]:
# a list to store the match results (one dictionary per game)
matches_data = []

# going through all the values(the matches)
for matches_dict in afcon_data.values():
    
    # doing same for the keys that identify the groups and teams
    for group_key, team_games in matches_dict.items():
        
        # going through every game played by the teams 
        for team_game in team_games:
          
          # going through every game played by the team
          for game in matches_dict[group_key][team_game]:

            # extracting the needed information from the game
            
            match_info = {
               
                'game_id': game['id'],

                'Date': game['startTimestamp'],
                
                'group_name': game['tournament']['name'],  # Extract group name
                
                'home_team': game['homeTeam']['name'],  # Home team name

                'away_team': game['awayTeam']['name'],  # Away team name

                'home_goals_ht': game['homeScore']['period1'],  # Home team goals at half-time

                'away_goals_ht': game['awayScore']['period1'],  # Away team goals at half-time

                'home_goals_2nd_half': game['homeScore']['period2'],  # Home team goals in the second half

                'away_goals_2nd_half': game['awayScore']['period2'],  # Away team goals in the second half

                'home_goals_ft': game['homeScore']['normaltime'],  # Full-time home goals
                
                'away_goals_ft': game['awayScore']['normaltime'],   # Full-time away goals
                
            }

            matches_data.append(match_info)
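The extraction above indexes every key directly, so a record that lacks one of the score fields would raise a `KeyError`. Whether the API ever omits these fields is an assumption here, not something confirmed by this notebook. A hedged sketch of the same extraction using `dict.get`, so a missing field becomes `None`; `extract_match` and the sample record are hypothetical.

```python
def extract_match(game):
    """Flatten one game record into the columns used in this project."""
    home_score = game.get('homeScore', {})
    away_score = game.get('awayScore', {})
    return {
        'game_id': game.get('id'),
        'Date': game.get('startTimestamp'),
        'group_name': game.get('tournament', {}).get('name'),
        'home_team': game.get('homeTeam', {}).get('name'),
        'away_team': game.get('awayTeam', {}).get('name'),
        'home_goals_ht': home_score.get('period1'),
        'away_goals_ht': away_score.get('period1'),
        'home_goals_2nd_half': home_score.get('period2'),
        'away_goals_2nd_half': away_score.get('period2'),
        'home_goals_ft': home_score.get('normaltime'),
        'away_goals_ft': away_score.get('normaltime'),
    }


# Made-up record; 'period2' is deliberately missing from the away score
sample_game = {
    'id': 1,
    'startTimestamp': 1705176000,
    'tournament': {'name': 'Africa Cup of Nations, Group A'},
    'homeTeam': {'name': 'Team A'},
    'awayTeam': {'name': 'Team B'},
    'homeScore': {'period1': 1, 'period2': 1, 'normaltime': 2},
    'awayScore': {'period1': 0, 'normaltime': 0},
}

row = extract_match(sample_game)
```

Rows with `None` values can then be inspected or filtered before the CSV file is written.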
In [ ]:
# creating a csv file to store the matches

csv_file = 'afcon_group_stage_2024.csv'

# the headers for the csv file

fields = ['group_name','game_id','Date', 'home_team', 'away_team', 'home_goals_ht', 'away_goals_ht', 
          'home_goals_2nd_half', 'away_goals_2nd_half', 'home_goals_ft', 'away_goals_ft',]


with open(csv_file, mode='w', newline='') as file:
    
    writer = csv.DictWriter(file, fieldnames=fields)
    
    # Write the header
    writer.writeheader()
    
    # Write each match's data
    for match in matches_data:
        writer.writerow(match)

Microsoft Excel was used to correct inaccurate match results and to remove matches that appeared more than once. Additional columns were created for the number of goals scored in each half, the result of each half, and the total number of goals scored in the game. The date was also converted from the timestamp format in the JSON data to a readable date format.
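The same cleaning steps could also be done in pandas. The following is a minimal sketch under stated assumptions, not the procedure used for this project: `startTimestamp` is taken to be a Unix timestamp in seconds, the rows are made up, and the names `result_label`, `first_half_result`, `second_half_result` and `total_goals` are hypothetical (only `total_1st_half_goals` and `total_2nd_half_goals` appear later in this notebook).

```python
import pandas as pd

# Made-up rows: the first match appears twice, once for each team
matches = pd.DataFrame({
    'game_id': [1, 1, 2],
    'Date': [1705176000, 1705176000, 1705262400],
    'home_team': ['Team A', 'Team A', 'Team C'],
    'away_team': ['Team B', 'Team B', 'Team D'],
    'home_goals_ht': [1, 1, 0],
    'away_goals_ht': [0, 0, 0],
    'home_goals_2nd_half': [1, 1, 0],
    'away_goals_2nd_half': [0, 0, 2],
})

# Keep one row per match
matches = matches.drop_duplicates(subset='game_id').reset_index(drop=True)

# Convert the timestamp (assumed to be in seconds) to a readable datetime
matches['Date'] = pd.to_datetime(matches['Date'], unit='s')

# Goals scored by both teams in each half and in the whole game
matches['total_1st_half_goals'] = matches['home_goals_ht'] + matches['away_goals_ht']
matches['total_2nd_half_goals'] = (matches['home_goals_2nd_half']
                                   + matches['away_goals_2nd_half'])
matches['total_goals'] = (matches['total_1st_half_goals']
                          + matches['total_2nd_half_goals'])


def result_label(home_goals, away_goals):
    """Label the outcome of a half from the home team's point of view."""
    if home_goals > away_goals:
        return 'Home win'
    if home_goals < away_goals:
        return 'Away win'
    return 'Draw'


matches['first_half_result'] = [
    result_label(h, a)
    for h, a in zip(matches['home_goals_ht'], matches['away_goals_ht'])
]
matches['second_half_result'] = [
    result_label(h, a)
    for h, a in zip(matches['home_goals_2nd_half'], matches['away_goals_2nd_half'])
]
```

Doing these steps in code keeps the cleaning reproducible, although manual score corrections would still need a trusted source to check against.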

In [4]:
afcon_2024_group_stage = pd.read_csv(r"C:\Users\Felix\Documents\python practice\data science projects\AFCON_2024_ANALYSIS\Files\AFCON GROUP STAGE GAMES 2023.csv")
In [5]:
# Trimming the group name

afcon_2024_group_stage['group_name']= afcon_2024_group_stage['group_name'].str.replace(r'^Africa Cup of Nations, ', '',regex=True)
In [6]:
afcon_2024_group_stage['Date'] = pd.to_datetime(afcon_2024_group_stage['Date'],dayfirst=True)
In [ ]:
afcon_2024_group_stage.info()

Exploratory Data Analysis

In [7]:
# loading a csv file containing first and second half goals of all the teams

first_second_half_goals = pd.read_csv(r"C:\Users\Felix\Documents\python practice\data science projects\AFCON_2024_ANALYSIS\Files\AFCON GROUP STAGE GAMES 2023_FIRST AND SECOND HALF GOALS.csv")
In [8]:
# Summing the first and second half goals 

total_first_half_goals = first_second_half_goals['first_half_goals'].sum()

total_second_half_goals = first_second_half_goals['second_half_goals'].sum()
In [9]:
# setting up the values and labels for the pie chart

goals = [total_first_half_goals, total_second_half_goals]

labels = ['First Half Goals', 'Second Half Goals']

A total of 89 goals were scored in the group stage of the tournament. Thirty-three goals (about 37%) were scored in the first half of games, while fifty-six (about 63%) were scored in the second half.
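The percentages follow directly from the two half totals. A minimal sketch of the arithmetic, using the half totals quoted above; `goal_shares` is a hypothetical helper and not part of the notebook.

```python
def goal_shares(first_half_goals, second_half_goals):
    """Return the combined total and each half's share of it, in percent."""
    total = first_half_goals + second_half_goals
    first_share = round(100 * first_half_goals / total, 1)
    second_share = round(100 * second_half_goals / total, 1)
    return total, first_share, second_share


total, first_share, second_share = goal_shares(33, 56)
print(total, first_share, second_share)
```

In the notebook, `total_first_half_goals` and `total_second_half_goals` could be passed in place of the literal values.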

In [10]:
# A pie chart to show the distribution of goals in the first and second half

fig = px.pie(names=labels, values=goals)

fig.update_layout(title="Goals Scored in First and Second Halves by Teams")

fig.show()

The first half of games averaged 0.9 goals per game. Morocco, Cape Verde, Equatorial Guinea and South Africa were the highest-scoring teams in the first half, with three goals each. Cameroon, Mozambique, Gambia and Namibia all failed to score in the first half of any of their group stage games.

The second half of games averaged 1.5 goals per game. Senegal and Equatorial Guinea scored the most second-half goals, with six each. Egypt and Cameroon followed with five goals each, while Cape Verde, Mozambique and Angola scored four each.
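Observations like these can be read off the team-level table programmatically. A minimal sketch, assuming a table with the `Team`, `first_half_goals` and `second_half_goals` columns used in this notebook; the helper name `top_and_scoreless` and the rows are made up for illustration and are not tournament data.

```python
import pandas as pd


def top_and_scoreless(goals, column):
    """Return the teams with the most goals in `column` and the teams with none."""
    top = goals.loc[goals[column] == goals[column].max(), 'Team'].tolist()
    scoreless = goals.loc[goals[column] == 0, 'Team'].tolist()
    return top, scoreless


# Made-up rows for illustration only
example = pd.DataFrame({
    'Team': ['Team A', 'Team B', 'Team C', 'Team D'],
    'first_half_goals': [3, 0, 3, 1],
    'second_half_goals': [2, 4, 0, 4],
})

first_half_top, first_half_scoreless = top_and_scoreless(example, 'first_half_goals')
second_half_top, second_half_scoreless = top_and_scoreless(example, 'second_half_goals')
```

Comparing against the column maximum, rather than sorting and taking the first row, keeps every team that is tied for the lead.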

In [14]:
# bar chart displaying first and second half goals scored by each team

fig = px.bar(first_second_half_goals, x = 'Team', 
             y = ['first_half_goals','second_half_goals'], 
             barmode='group',
             labels={'variable' : 'Half','value' : 'Goals','Team' : 'Team'},
             color_discrete_sequence=['orange','blue'],
             )

fig.update_xaxes(tickangle = 45)

fig.update_traces(texttemplate = '%{y}',
                  textposition = 'inside',
                  )
            
fig.update_layout(title="Goals Scored in First and Second Halves by Teams",
                  bargap = 0.35,
                  template = 'seaborn',
                  height = 500)


fig.show()

Top Scoring Teams in The Group Stages

In [12]:
fig = px.bar(first_second_half_goals.sort_values(by='total_goals_scored',ascending=False ), x = 'Team', 
             y = 'total_goals_scored',
             )

fig.update_traces(texttemplate = '%{y}',
                  textposition = 'inside',
                  )

fig.update_xaxes(tickangle = 45)

fig.update_layout(title="Goals Scored in The Group Stages",
                  template = 'seaborn')
fig.show()

Comparison of Goals Scored in The Groups

In [13]:
# the number of first and second half goals scored in each group

group_goals = afcon_2024_group_stage.groupby('group_name')[['total_1st_half_goals','total_2nd_half_goals']].sum()

# group_goals = group_goals.sort_values(by='total_2nd_half_goals', ascending=False)

Grouped bar chart to compare the goals scored in the groups

In [14]:
fig = px.bar(group_goals, 
             y = ['total_2nd_half_goals','total_1st_half_goals'], # the values for the y-axis
             x = group_goals.index, # the labels on the x-axis
             barmode='group',
             labels={'variable' : 'Half','value' : 'Goals','Team' : 'Team'},
             )

fig.update_traces(texttemplate='%{y}',  # Use the y-value of each bar as text
                  textposition='inside')

fig.update_layout(
    title='Comparison of First and Second Half Goals Scored in Each Group',  # Informative title
    xaxis_title='Groups',  # Name for x-axis
    yaxis_title='Number of Goals',  # Name for y-axis
    template='seaborn'
)
fig.show()

Stacked bar chart comparing goals scored in the groups

In [15]:
fig = px.bar(group_goals, 
             y = ['total_2nd_half_goals','total_1st_half_goals'],
             x = group_goals.index, 
             barmode='stack',
             labels={'variable' : 'Half','value' : 'Goals','Team' : 'Team'},
             )

fig.update_traces(texttemplate='%{y}',  # Use the y-value of each bar as text
                  textposition='inside')

fig.update_layout(
    title='Comparison of First and Second Half Goals Scored in Each Group',  # Informative title
    xaxis_title='Groups',  # Name for x-axis
    yaxis_title='Number of Goals',  # Name for y-axis
    template='seaborn'
)
fig.show()
In [16]:
# sorting the values of the dataframe using their group names

group_name = afcon_2024_group_stage[['group_name','home_team']].sort_values('group_name')
In [17]:
# making the teams the index of the dataframe

group_name = group_name.set_index('home_team')
In [18]:
# converting the group name column into a dictionary that maps each team to its group

group_mapping = group_name['group_name'].to_dict()
In [19]:
# adding the group names to the dataframe using the map method

first_second_half_goals['group_name'] = first_second_half_goals['Team'].map(group_mapping)
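The mapping above is built from the `home_team` column only, so a team that never appears as the home side would be mapped to `NaN`. Whether that happens in this dataset is not shown here; as a precaution, the sketch below draws team names from both `home_team` and `away_team`. The rows are made up for illustration.

```python
import pandas as pd

# Made-up rows: Team B and Team D only ever appear as the away side
games = pd.DataFrame({
    'group_name': ['Group A', 'Group A', 'Group B'],
    'home_team': ['Team A', 'Team A', 'Team C'],
    'away_team': ['Team B', 'Team B', 'Team D'],
})

# Stack home and away teams into a single 'Team' column
home = games[['group_name', 'home_team']].rename(columns={'home_team': 'Team'})
away = games[['group_name', 'away_team']].rename(columns={'away_team': 'Team'})
team_groups = pd.concat([home, away]).drop_duplicates(subset='Team')

# Dictionary mapping each team to its group
group_mapping = team_groups.set_index('Team')['group_name'].to_dict()
```

Dropping duplicates before `set_index` also makes it explicit that each team keeps exactly one group.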
In [40]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots




# Create subplots with 2 rows and 3 columns
fig = make_subplots(
    rows=2, cols=3,
    subplot_titles=first_second_half_goals['group_name'].unique(),
    shared_yaxes=False  # Each subplot keeps its own y-axis
)

# Define uniform colors for each goal type
colors = {'1st Half Goals': 'blue', '2nd Half Goals': 'green', 'Total Goals': 'orange'}

# Track whether a legend entry has been added for each category
legend_added = {'1st Half Goals': False, '2nd Half Goals': False, 'Total Goals': False}

# Create bar charts for each group
row = 1
col = 1

# Loop over the unique groups and create the plots
for group in first_second_half_goals['group_name'].unique():
    group_data = first_second_half_goals[first_second_half_goals['group_name'] == group]

    # Plot for 1st Half Goals
    fig.add_trace(
        go.Bar(
            x=group_data['Team'],
            y=group_data['first_half_goals'],
            name='1st Half Goals',  # Add to legend only once
            marker_color=colors['1st Half Goals'],
            showlegend=not legend_added['1st Half Goals'],  # Only show legend once
            legendgroup='First Half'
        ),
        row=row, col=col
    )
    legend_added['1st Half Goals'] = True

    # Plot for 2nd Half Goals
    fig.add_trace(
        go.Bar(
            x=group_data['Team'],
            y=group_data['second_half_goals'],
            name='2nd Half Goals',  # Add to legend only once
            marker_color=colors['2nd Half Goals'],
            showlegend=not legend_added['2nd Half Goals'],  # Only show legend once
            legendgroup='Second Half'
        ),
        row=row, col=col
    )
    legend_added['2nd Half Goals'] = True

    fig.update_traces(texttemplate='%{y}',  # Use the y-value of each bar as text
                  textposition='inside')

    # Plot for Total Goals
    # fig.add_trace(
    #     go.Bar(
    #         x=group_data['Team'],
    #         y=group_data['total_goals_scored'],
    #         name='Total Goals' if not legend_added['Total Goals'] else None,  # Add to legend only once
    #         marker_color=colors['Total Goals'],
    #         showlegend=not legend_added['Total Goals']  # Only show legend once
    #     ),
    #     row=row, col=col
    
    # legend_added['Total Goals'] = True

    # Update row and column positions for the next group
    col += 1
    if col > 3:
        col = 1
        row += 1

# Update layout
fig.update_layout(
    title_text="Comparison of First and Second Half Goals Scored Across Groups",
    height=800,
    width = 1200,
    showlegend=True,
    barmode='group',  # Place the bars for each team side by side
    template = 'plotly_white'
    
)

# formatting the labels on the xaxis to make the graph look readable.
# the loop applies the changes to all the graphs in the subplot.

for i in range(1, len(first_second_half_goals['group_name'].unique()) + 1):
    fig.update_xaxes(tickangle=45, row=(i - 1) // 3 + 1, col=(i - 1) % 3 + 1)



# formatting the yaxis values for uniformity. 

y_axis_range = [0,6]

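# a loop that applies the range of values to all the graphs in the subplot
# (iterating over the number of groups, not the length of the last group's name)
for i in range(1, len(first_second_half_goals['group_name'].unique()) + 1):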
    fig.update_yaxes(range=y_axis_range, 
                     row=(i - 1) // 3 + 1, 
                     col=(i - 1) % 3 + 1,
                     dtick=1)

# Show the figure
fig.show()
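The row and column arithmetic used in the loops above can be isolated in one helper, which avoids repeating the `// 3` and `% 3` expressions. A minimal sketch; `grid_position` is a hypothetical name and the default of three columns mirrors the 2 x 3 grid used here.

```python
def grid_position(index, n_cols=3):
    """Return the 1-based (row, col) of the subplot at 0-based `index`."""
    row, col = divmod(index, n_cols)
    return row + 1, col + 1


# Positions of six subplots in a grid with three columns
positions = [grid_position(i) for i in range(6)]
print(positions)
```

The same helper could supply the `row` and `col` arguments both when adding traces and when updating the axes, so the two loops cannot drift apart.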